Abstract. This article presents a lemma in the spirit of the pumping lemma
for indexed languages but easier to employ. 



Section 1. Introduction. 

The pumping lemma for context-free languages has been extended to stack
languages [O] and indexed languages [H], but these generalizations are rather
complicated. In this article we take a slightly different approach by
concentrating only on that part of the context-free pumping lemma which says
that if $uvwxy \in L$, then $uwy \in L$, and by employing a theorem on
divisibility of words which is not used in [O] or [H]. Our result, Theorem A,
is relatively easy to state and strong enough to verify the examples given in
[H] of languages which are not indexed. On the other hand, unlike
[H, Theorem 5.1], it does not afford a proof that the finiteness problem for
indexed languages is solvable.

Indexed languages were introduced by Aho [A1], [A2]. A brief introduction
appears in [HU, Chapter 14]. Our original motivation for Theorem A was the
investigation of finitely generated groups for which the language of words
defining the identity is indexed.

Section 2. A Result on Indexed Languages. 

Before stating our result we fix some notation. $\Sigma$ is a finite alphabet,
$|w|$ is the length of $w \in \Sigma^*$, and for each $a \in \Sigma$, $|w|_a$ is
the number of $a$'s in $w$.

Theorem A. Let $L$ be an indexed language over $\Sigma$ and $m$ a positive
integer. There is a constant $k > 0$ such that each word $w \in L$ with
$|w| > k$ can be written as a product $w = w_1 \cdots w_r$ for which the
following conditions hold.

(1) $m < r \le k$.

(2) The factors $w_i$ are nonempty words.
(3) Each choice of $m$ factors is included in a proper subproduct which
lies in $L$.

By (3) we mean that the chosen factors occur in a product
$w_{i_1} \cdots w_{i_t} \in L$ with $1 \le i_1 < \cdots < i_t \le r$ and
$m \le t < r$. The proof of Theorem A is given in the next section.

Corollary 1. Let $L$ be an indexed language. There is a constant $k > 0$ such
that if $w \in L$ and $|w| > k$, then there exists $v \in L$ with
$(1/k)|w| \le |v| < |w|$.

Proof. Take $m = 1$ in Theorem A and choose a factor of maximum length. □
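For the record, the counting behind this proof runs as follows, where $w_j$ denotes a factor of maximum length and $v$ is the proper subproduct in $L$ containing it, as supplied by condition (3) of Theorem A:

$$\frac{|w|}{k} \;\le\; \frac{|w|}{r} \;\le\; \max_{1 \le i \le r} |w_i| \;=\; |w_j| \;\le\; |v| \;<\; |w|.$$

The first inequality uses $r \le k$, the second says that a longest of $r$ factors is at least as long as their average, and the last holds because $v$ omits at least one factor and every factor is nonempty.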

By taking $m$ to be the number of letters in $\Sigma$ and arguing similarly we
obtain a result on the Parikh mapping.

Corollary 2. Let $L$ be an indexed language over $\Sigma$. There is a constant
$k > 0$ such that if $w \in L$ and $|w| > k$, then there exists $v \in L$ with
$(1/k)|w|_a \le |v|_a \le |w|_a$ for each $a \in \Sigma$ and $|v|_a < |w|_a$
for some $a \in \Sigma$.
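The selection step in this argument can be sketched in Python. The helpers `parikh_choice` and `satisfies_parikh_bound` are hypothetical illustrations, not part of the original text. For each letter $a$ the sketch picks a factor with the most $a$'s, then pads the choice to exactly $m = |\Sigma|$ factors, which is possible because Theorem A gives $r > m$. Any subproduct $v$ containing the chosen factors then satisfies $|v|_a \ge |w|_a / r \ge |w|_a / k$, since a factor with the most $a$'s contains at least the average number $|w|_a / r$.

```python
from typing import List, Set

def parikh_choice(factors: List[str], alphabet: str) -> Set[int]:
    """Pick m = len(alphabet) factor indices, including, for each letter a,
    a factor containing the maximum number of a's (hypothetical helper)."""
    m, r = len(alphabet), len(factors)
    if r <= m:
        raise ValueError("Theorem A guarantees r > m")
    chosen = {max(range(r), key=lambda i: factors[i].count(a)) for a in alphabet}
    # Pad with further indices so that exactly m factors are distinguished.
    for i in range(r):
        if len(chosen) == m:
            break
        chosen.add(i)
    return chosen

def satisfies_parikh_bound(factors: List[str], chosen: Set[int], alphabet: str) -> bool:
    """Check r * |chosen subproduct|_a >= |w|_a for every letter a."""
    w = "".join(factors)
    sub = "".join(factors[i] for i in sorted(chosen))
    r = len(factors)
    return all(r * sub.count(a) >= w.count(a) for a in alphabet)

factors = ["ab", "bbb", "a", "ca"]
chosen = parikh_choice(factors, "abc")
print(sorted(chosen), satisfies_parikh_bound(factors, chosen, "abc"))  # prints [0, 1, 3] True
```

The actual $v$ may contain further factors besides the chosen ones; that can only increase each $|v|_a$, so the bound is unaffected.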

Corollary 1 has the following immediate consequence. 

Corollary 3. [H, Theorem 5.2] If $f$ is a strictly increasing function on the
positive integers, and $L = \{a^{f(n)} : n \ge 1\}$ is an indexed language,
then $f(n) = O(k^n)$ for some positive integer $k$.
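Since the argument is left to the reader, here is one way to fill it in, using only Corollary 1 (we may enlarge its constant $k$ to an integer $\ge 1$ without affecting the statement). If $f(n) > k$, Corollary 1 applied to $w = a^{f(n)}$ yields $v = a^{f(j)} \in L$ with $f(n)/k \le f(j) < f(n)$. As $f$ is strictly increasing, $j < n$, so $f(j) \le f(n-1)$. Together with the trivial case $f(n) \le k$ this gives, for $n \ge 2$,

$$f(n) \;\le\; k \max\bigl(1, f(n-1)\bigr), \qquad\text{hence}\qquad f(n) \;\le\; k^{\,n-1} \max\bigl(1, f(1)\bigr) \quad (n \ge 1)$$

by induction on $n$, so that $f(n) = O(k^n)$.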

Corollary 4. [H, Theorem 5.3] The language $L = \{(ab^n)^n \mid n \ge 1\}$ is
not indexed.

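Proof. Suppose $L$ is indexed, and apply Theorem A to $L$ with $m = 1$. Pick
$w = (ab^n)^n$ with $n > k$ and consider the decomposition $w = w_1 \cdots w_r$.
As $r \le k < n$ and $w$ contains exactly $n$ occurrences of $a$, at least one
factor $w_i$ must contain two or more $a$'s; since consecutive $a$'s in $w$ are
separated by $b^n$, such a factor contains the subword $ab^na$. Choose that
$w_i$ to be in the proper subproduct $v$. But then $v \in L$ contains the
subword $ab^na$. Since consecutive $a$'s in a word $(ab^j)^j$ of $L$ are
separated by exactly $j$ letters $b$, this forces $v = (ab^n)^n = w$, which is
impossible as $v$ is a proper subproduct of $w$. □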
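The two combinatorial facts used in this proof are easy to confirm by brute force for small $n$. The following Python sketch (the function names are ours, for illustration only) checks that every factor of $w = (ab^n)^n$ with at least two $a$'s contains $ab^na$, and that no shorter word $(ab^j)^j$ of $L$ contains $ab^na$. It is a sanity check, not a substitute for the proof.

```python
def word(n: int) -> str:
    """The word (a b^n)^n of L."""
    return ("a" + "b" * n) * n

def in_L(v: str) -> bool:
    """Membership in L = {(a b^n)^n : n >= 1}; n must equal the number of a's."""
    n = v.count("a")
    return n >= 1 and v == word(n)

def check_facts(n: int) -> bool:
    w = word(n)
    pattern = "a" + "b" * n + "a"
    # Fact 1: every contiguous factor of w with two or more a's contains a b^n a.
    for i in range(len(w)):
        for j in range(i + 2, len(w) + 1):
            piece = w[i:j]
            if piece.count("a") >= 2 and pattern not in piece:
                return False
    # Fact 2: shorter words of L, (a b^j)^j with j < n, never contain a b^n a.
    return all(pattern not in word(j) for j in range(1, n))

print(all(check_facts(n) for n in range(1, 8)))  # prints True
```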

Section 3. Proof. 

The proof of Theorem A depends on a result about divisibility of words.
We say that $v$ divides $w$ and write $v \preceq w$ if $v$ is a subsequence of
$w$. For example $ac \preceq abc$. By a theorem of Higman [SS, Theorem 6.1.2]
every set of words defined over a finite alphabet and pairwise incomparable
with respect to divisibility is finite. We will use this result in the
following form.
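Divisibility in this sense is easy to test, and the finite set guaranteed by Higman's theorem is obtained by keeping the minimal words. The following Python sketch (function names are our own) checks $v \preceq w$ by greedy left-to-right matching and extracts from a finite collection the words not divisible by any other word in it. Higman's theorem is what makes this set finite even for an infinite collection; the code, of course, only handles finite samples.

```python
from typing import Iterable, Sequence, Set

def divides(v: Sequence, w: Sequence) -> bool:
    """True if v is a subsequence of w, i.e. v divides w.
    Greedy matching: 'c in it' advances the iterator just past the next c."""
    it = iter(w)
    return all(c in it for c in v)

def minimal_words(words: Iterable[str]) -> Set[str]:
    """Words of the collection that are divisible by no other word in it."""
    ws = set(words)
    return {w for w in ws if not any(u != w and divides(u, w) for u in ws)}

print(divides("ac", "abc"), divides("ca", "abc"))  # prints True False
print(sorted(minimal_words(["ab", "ba", "aab", "abba", "aa"])))  # prints ['aa', 'ab', 'ba']
```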

Lemma 1. Let $m$ be a positive integer and $Y$ a language over a finite
alphabet $A$. $Y$ contains a finite subset $X$ with the property that for any
$y \in Y - X$ with $m$ letters distinguished there is an $x \in X$ such that
$x \preceq y$ and $x$ includes all the distinguished letters of $y$.

Proof. Let $A'$ be the union of $A$ with $m$ pairwise disjoint copies of itself,
and define $Y'$ to be the language of all words over $A'$ which project to $Y$
and contain exactly one letter from each of the $m$ copies of $A$. By Higman's
theorem $X'$, the set of all words in $Y'$ each of which is not divisible by
any word in $Y'$ except itself, is finite, since its elements are pairwise
incomparable. For any $y' \in Y'$, if we take $x'$ to be a word of minimum
length among all words in $Y'$ dividing $y'$, then $x' \in X'$. Further $x'$
contains all the letters of $y'$ from $A' - A$, because $x' \preceq y'$ and
each of $x'$ and $y'$ contains exactly one letter from each copy of $A$.

Define $X$ to be the union of the projection of $X'$ to $A^*$ with the set of
all words in $Y$ of length less than $m$. Suppose that $y \in Y - X$ has $m$
distinguished letters. Since $|y| \ge m$, we can pick $y' \in Y'$ projecting to
$y$ so that the distinguished letters of $y$ correspond to the letters of $y'$
in $A' - A$. By the preceding paragraph $y'$ is divisible by an $x' \in X'$
which contains those letters. It follows that the projection of $x'$ to $A^*$
is the desired word $x$. □
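To make the construction concrete, here is a Python sketch of it on a finite sample $Y$. For finite $Y$ the lemma is trivial ($X = Y$ works), so the point is only to illustrate the steps of the proof. By our own convention, letters of $A'$ are encoded as pairs `(letter, tag)`, where tag 0 stands for $A$ and tags $1, \ldots, m$ for the copies. The function names are hypothetical; `covers` checks by brute force that some embedding of $x$ into $y$ hits every distinguished position.

```python
from itertools import combinations, permutations
from typing import Iterable, Iterator, Sequence, Set, Tuple

def divides(v: Sequence, w: Sequence) -> bool:
    """True if v is a subsequence of w."""
    it = iter(w)
    return all(c in it for c in v)

def marked_versions(y: str, m: int) -> Iterator[Tuple[Tuple[str, int], ...]]:
    """All words y' over A' projecting to y with exactly one letter from each copy 1..m."""
    for positions in combinations(range(len(y)), m):
        for tags in permutations(range(1, m + 1)):
            tag_at = dict(zip(positions, tags))
            yield tuple((c, tag_at.get(i, 0)) for i, c in enumerate(y))

def lemma1_basis(Y: Iterable[str], m: int) -> Set[str]:
    """X = projection of the minimal elements X' of Y', plus the words of Y shorter than m."""
    Y = set(Y)
    Y_prime = {yp for y in Y for yp in marked_versions(y, m)}
    X_prime = {x for x in Y_prime
               if not any(u != x and divides(u, x) for u in Y_prime)}
    projected = {"".join(c for c, _ in xp) for xp in X_prime}
    return projected | {y for y in Y if len(y) < m}

def covers(x: str, y: str, D: Iterable[int]) -> bool:
    """Is x a subsequence of y via an embedding whose image contains every position in D?"""
    D = set(D)
    for image in combinations(range(len(y)), len(x)):
        if D <= set(image) and all(y[j] == c for j, c in zip(image, x)):
            return True
    return False

Y = {"ab", "aab", "abb", "abab", "ba", "bab", "bba"}
m = 1
X = lemma1_basis(Y, m)
ok = all(any(covers(x, y, D) for x in X)
         for y in Y - X for D in combinations(range(len(y)), m))
print(sorted(X), ok)  # prints ['ab', 'ba'] True
```

Here every marked version of a longer word in the sample is divisible by a marked version of `ab` or `ba`, so only those two words survive in $X$.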

Notice that $x$ might be a subsequence of $y$ in more than one way. Lemma 1
asserts only that there is some subsequence of $y$ which includes the
distinguished letters and whose product is $x$.

Fix an indexed language $L$ over $\Sigma$, and let $G$ be an indexed grammar
for $L$. Let $G$ have sentence symbol $S$, nonterminals $N$, and indices $F$;
then $(NF^* + \Sigma)^*$ is the set of sentential forms. By [A1, Theorem 4.5]
we may assume $G$ is in normal form, i.e.,

(1) $S$ does not appear on the righthand side of any production;

(2) There are no $\varepsilon$-productions except perhaps $S \to \varepsilon$;

(3) Each production has one of the forms $A \to BC$, $Af \to B$, $A \to Bf$,
or $A \to a$, where $A, B, C \in N$, $f \in F$, and $a \in \Sigma$.

We are using the definition of indexed grammar from [HU]; this definition is
slightly different from the original.

We write $\alpha \Rightarrow^* \beta$ to indicate that the sentential form
$\beta$ can be derived from the sentential form $\alpha$ via productions of
$G$, and we use $\beta \cdot \omega$ to denote the sentential form obtained by
appending the index string $\omega$ to the index string of every nonterminal in
the sentential form $\beta$. It follows from the way derivations are defined in
indexed grammars that if $\alpha \Rightarrow^* \beta$, then
$\alpha \cdot \omega \Rightarrow^* \beta \cdot \omega$. Conversely, if
$\alpha \cdot \omega \Rightarrow^* \beta \cdot \omega$ and every nonterminal
occurring in that derivation has an index string with suffix $\omega$, then
$\alpha \Rightarrow^* \beta$.
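The operation $\beta \cdot \omega$ and the first of these two facts can be illustrated with a small Python model of normal-form productions. The encoding (a nonterminal as a name plus an index tuple whose first entry is the top index, productions as tagged tuples) and all function names are our own, not taken from [HU] or [A1]. The sketch assumes the [HU] conventions that $A \to BC$ copies the index string to both children, $Af \to B$ pops $f$, $A \to Bf$ pushes $f$, and $A \to a$ discards the index string.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class NT:
    name: str
    index: Tuple[str, ...] = ()   # index string; index[0] is the top

Form = Tuple[Union[NT, str], ...]  # str entries are terminal letters

def append_index(form: Form, omega: Tuple[str, ...]) -> Form:
    """beta . omega: append omega to the index string of every nonterminal."""
    return tuple(NT(s.name, s.index + omega) if isinstance(s, NT) else s for s in form)

def step(form: Form, pos: int, prod: tuple) -> Optional[Form]:
    """Apply one normal-form production at position pos; None if it does not apply."""
    s = form[pos]
    if not isinstance(s, NT) or s.name != prod[1]:
        return None
    kind = prod[0]
    if kind == "split":                      # A -> BC
        new = (NT(prod[2], s.index), NT(prod[3], s.index))
    elif kind == "pop":                      # Af -> B
        if not s.index or s.index[0] != prod[2]:
            return None
        new = (NT(prod[3], s.index[1:]),)
    elif kind == "push":                     # A -> Bf
        new = (NT(prod[2], (prod[3],) + s.index),)
    elif kind == "term":                     # A -> a
        new = (prod[2],)
    else:
        raise ValueError(f"unknown production kind {kind!r}")
    return form[:pos] + new + form[pos + 1:]

def run(form: Form, steps) -> Form:
    """Apply a sequence of (position, production) steps."""
    for pos, prod in steps:
        nxt = step(form, pos, prod)
        if nxt is None:
            raise ValueError("production does not apply")
        form = nxt
    return form

# A derivation alpha =>* beta that never pops below the original index string.
steps = [(0, ("push", "A", "A", "f")),
         (0, ("split", "A", "B", "C")),
         (0, ("pop", "B", "f", "D")),
         (0, ("term", "D", "a"))]
alpha = (NT("A", ("g",)),)
beta = run(alpha, steps)
omega = ("h",)
print(run(append_index(alpha, omega), steps) == append_index(beta, omega))  # prints True
```

The same steps replayed on $\alpha \cdot \omega$ end at $\beta \cdot \omega$, because no production ever inspects the appended suffix.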

Lemma 2. Let $m$ be a positive integer and $A\omega$ a sentential form in
$NF^*$. There is a finite set of sentential forms $X \subseteq (N + \Sigma)^*$
with the property that if $A\omega \Rightarrow^* \beta \in (N + \Sigma)^* - X$,
and $m$ symbols of $\beta$ are distinguished, then there is $\alpha \in X$ such
that $A\omega \Rightarrow^* \alpha \preceq \beta$, and $\alpha$ includes all the
distinguished symbols of $\beta$.

Proof. Apply Lemma 1 to the language of all sentential forms in
$(N + \Sigma)^*$ derivable from $A\omega$. □

Consider a derivation $S \Rightarrow^* w \in L$, and let $\Gamma$ be the
corresponding derivation tree. Let each vertex $p$ of $\Gamma$ have label
$\lambda(p)$, and define a subtree $\Gamma(p)$ with root $p$ as follows. If
$\lambda(p)$ is a terminal or a nonterminal with empty index string, then
$\Gamma(p)$ consists of $p$ and all its descendants. Otherwise
$\lambda(p) = Af\omega$ for some nonterminal $A$, index $f$, and string of
indices $\omega$. In this case along each path emanating from $p$ there will be
a first vertex, perhaps a leaf of $\Gamma$, at which $f$ is consumed. Define
$\Gamma(p)$ to be the union of all the paths from $p$ up to and including these
first vertices. The subtrees $\Gamma(p)$ play an important role in [H]; we will
use them here in a slightly different way than they are used there.

Let $\gamma(p)$ be the sentential form obtained by concatenating the labels of
the leaves of $\Gamma(p)$ in order; if $p$ is a leaf, $\gamma(p) = \lambda(p)$.
Since $\Gamma(p)$ is a subtree of a derivation tree,
$\lambda(p) \Rightarrow^* \gamma(p)$. If $\lambda(p) = Af\omega$, then by
construction all vertices of $\Gamma(p)$ except its leaves have labels of the
form $B\omega' f\omega$. The leaves are labelled by terminals or by labels of
the form $B\omega$. Deleting all the suffixes $\omega$ yields a derivation tree
for $Af \Rightarrow^* \beta(p)$, where $\gamma(p) = \beta(p) \cdot \omega$.
Extend the definition of $\beta(p)$ to all other vertices $p$ of $\Gamma$ by
defining $\beta(p) = \gamma(p)$ when $\lambda(p)$ is a terminal or nonterminal.
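The definitions of $\Gamma(p)$, $\gamma(p)$ and $\beta(p)$ can be sketched in Python on an explicit derivation tree. The tree encoding and function names are hypothetical. We read "first vertex at which $f$ is consumed" as a descendant that is a terminal or whose index string has been cut back to $\omega$; that reading is our assumption about how to operationalize the definition.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass(frozen=True)
class NT:
    name: str
    index: Tuple[str, ...] = ()   # index string; index[0] is the top

Label = Union[NT, str]            # str labels are terminal letters

@dataclass
class Vertex:
    label: Label
    children: List["Vertex"] = field(default_factory=list)

def frontier(q: Vertex) -> List[Label]:
    """Labels of all leaves below q, left to right."""
    if not q.children:
        return [q.label]
    return [lab for c in q.children for lab in frontier(c)]

def gamma(p: Vertex) -> List[Label]:
    """gamma(p): labels of the leaves of Gamma(p), left to right."""
    lab = p.label
    if isinstance(lab, str) or not lab.index:
        return frontier(p)               # Gamma(p) is the whole subtree below p
    k = len(lab.index) - 1               # lambda(p) = A f omega with |omega| = k
    out: List[Label] = []
    def walk(q: Vertex) -> None:
        ql = q.label
        consumed = isinstance(ql, str) or len(ql.index) <= k
        if (q is not p and consumed) or not q.children:
            out.append(ql)               # first vertex on this path where f is consumed
        else:
            for c in q.children:
                walk(c)
    walk(p)
    return out

def beta(p: Vertex) -> List[Label]:
    """beta(p): gamma(p) with the common suffix omega deleted."""
    lab = p.label
    g = gamma(p)
    if isinstance(lab, str) or not lab.index:
        return g
    k = len(lab.index) - 1
    return [s if isinstance(s, str) else NT(s.name, s.index[:len(s.index) - k]) for s in g]

# A[fg] -> B[fg] C[fg];  Bf -> D, Dg -> E, E -> a;  C -> Hh, Hh -> J, Jf -> K, Kg -> M, M -> b
V = Vertex
p = V(NT("A", ("f", "g")), [
        V(NT("B", ("f", "g")), [
            V(NT("D", ("g",)), [V(NT("E"), [V("a")])])]),
        V(NT("C", ("f", "g")), [
            V(NT("H", ("h", "f", "g")), [
                V(NT("J", ("f", "g")), [
                    V(NT("K", ("g",)), [V(NT("M"), [V("b")])])])])])])
print(beta(p))  # prints [NT(name='D', index=()), NT(name='K', index=())]
```

In this example $\gamma(p) = D g \, K g$, so $\beta(p) = DK$ and $Af \Rightarrow^* DK$, matching the description above.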

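It follows from Lemma 2 that there is a finite set of sentential forms
$Z \subseteq (N + \Sigma)^*$ such that for any of the finitely many sentential
forms $A\omega \in N \cup NF$, if $A\omega \Rightarrow^* \beta \in
(N + \Sigma)^* - Z$ and $m$ symbols of $\beta$ are distinguished, then there is
$\alpha \in Z$ such that $A\omega \Rightarrow^* \alpha \preceq \beta$, and
$\alpha$ includes all the distinguished symbols of $\beta$. (Take $Z$ to be the
union of the finitely many sets $X$ supplied by Lemma 2.) Since it does no harm
to enlarge $Z$, we may assume $Z$ contains all elements of $(N + \Sigma)^*$ of
length at most $m$.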

Lemma 3. Let $C \ge 2$ be an upper bound for the lengths of elements of $Z$.
Suppose $\beta(p) \notin Z$ but $\beta(q) \in Z$ for all vertices $q$ which are
proper descendants of $p$. Then $|\beta(p)| \le C^2$.

Proof. If $p$ is a leaf, then $|\beta(p)| = 1$. Suppose $p$ has two children
$q_1, q_2$. It follows from the normal form for $G$ that
$\beta(p) = \beta(q_1)\beta(q_2)$, and consequently $|\beta(p)| \le 2C$.
Finally if $p$ has a single child $q$, then the derivation
$\lambda(p) \Rightarrow^* \gamma(p)$ begins with application of a production of
the form $A \to a$, $Af \to B$ or $A \to Bf$. In the first case
$|\beta(p)| = |a| = 1$. In the second case $\lambda(p)$ must be $Af\omega$,
whence $\beta(p) = B$ and again $|\beta(p)| = 1$.
Consider the last case. We have $\lambda(p) = A\omega$ and
$\lambda(q) = Bf\omega$. Further $\beta(p)$ is the product of the terms
$\beta(q')$ as $q'$ ranges over the leaves of $\Gamma(q)$. Since
$\beta(q) \in Z$, there are at most $C$ terms; and as each $\beta(q') \in Z$,
we have $|\beta(p)| \le C^2$. □
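Collecting the cases of this proof, and using $C \ge 2$ (so that $1 \le 2C \le C^2$), the bound reads:

$$|\beta(p)| \;\le\; \begin{cases} 1 & p \text{ a leaf, or first production } A \to a \text{ or } Af \to B,\\ 2C & \text{first production } A \to BC,\\ C \cdot C = C^2 & \text{first production } A \to Bf, \end{cases} \qquad\text{so}\qquad |\beta(p)| \le C^2 .$$

This is consistent with the choice $k = C^2 + 2$ below: a factorization $w = w' a_1 \cdots a_t w''$ with $t \le C^2$ has at most $t + 2 \le C^2 + 2$ factors.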

To complete the proof of Theorem A choose $k = C^2 + 2$ and suppose
$S \Rightarrow^* w \in L$ with $|w| > k$. Let $\Gamma$ be the corresponding
derivation tree and $p_0$ its root. Clearly $\beta(p_0) = w \notin Z$, since
$|w| > k > C$, and so we may choose $p$ to satisfy the hypothesis of Lemma 3
(for instance, a vertex of maximal depth with $\beta(p) \notin Z$). Note that
$\beta(p) \notin Z$ implies $|\beta(p)| > m$; in particular $p$ is not a leaf.

If $\lambda(p) = A$, then $\beta(p) = a_1 \cdots a_t$ is a subword of $w$ and
$m < t \le C^2$. Consequently $w = w' a_1 \cdots a_t w''$ exhibits $w$ as a
product of more than $m$ and at most $t + 2 \le k$ nonempty factors (omitting
$w'$ or $w''$ if it is empty). Suppose $m$ of the factors in this product are
distinguished. If not all these factors are letters $a_i$, distinguish more
letters $a_i$ to bring the total of distinguished letters to $m$. By definition
of $Z$ there is a word $u \in Z$ such that $A \Rightarrow^* u \preceq
a_1 \cdots a_t$ and $u$ contains all the distinguished letters of
$a_1 \cdots a_t$; moreover $u \ne a_1 \cdots a_t$, since
$a_1 \cdots a_t \notin Z$. It follows that $v = w' u w''$ contains the
distinguished factors of $w$ and satisfies all the conditions of Theorem A.

Finally $\lambda(p) = Af\omega$ implies $\beta(p) = z_1 \cdots z_t$ with
$m < t \le C^2$ and each $z_i \in N \cup \Sigma$. Further
$\gamma(p) = \beta(p) \cdot \omega$. Consequently $w = w' u_1 \cdots u_t w''$,
where each $u_i$ is the subword derived from $z_i \cdot \omega$ in the
derivation $S \Rightarrow^* w$. Because $G$ is in normal form, none of the
$u_i$'s is the empty word. As before there exists $\alpha \in Z$ such that
$Af \Rightarrow^* \alpha \preceq \beta(p)$ and $\alpha$ contains all the
$z_i$'s for which $u_i$ is distinguished. We have
$\alpha \cdot \omega \Rightarrow^* u$, where $u$ is the subproduct of
$u_1 \cdots u_t$ corresponding to the $z_i$'s in $\alpha$. It follows that
$v = w' u w''$ satisfies the conditions of Theorem A.